Random matrix ensembles of time-lagged correlation matrices: Derivation of 
eigenvalue spectra and analysis of financial time-series 
We derive the exact form of the eigenvalue spectra of correlation matrices derived from a 
set of time-shifted, finite Brownian random walks (time-series). These matrices can be seen as 
random, real, asymmetric matrices with a special structure superimposed due to the time-shift. 
We demonstrate that the associated eigenvalue spectrum is circular symmetric in the complex 
plane for large matrices. This fact allows us to exactly compute the eigenvalue density via an 
inverse Abel-transform of the density of the symmetrized problem. We demonstrate the validity 
of this approach by numerically computing eigenvalue spectra of lagged correlation matrices based 
on uncorrelated, Gaussian distributed time-series. We then compare our theoretical findings with 
eigenvalue densities obtained from actual high frequency (5 min) data of the S&P500 and discuss 
the observed deviations. We identify various non-trivial, non-random patterns and find asymmetric 
dependencies associated with eigenvalues departing strongly from the Gaussian prediction in the 
imaginary part. For the same time-series, with the market contribution removed, we observe strong 
clustering of stocks, i.e. causal sectors. We finally comment on the time-stability of the observed 
patterns. 

PACS: 02.50.-r, 02.10.Yn, 89.65.Gh, 05.45.Tp, 05.40.-a, 24.60.-k, 87.10.+e 



1. INTRODUCTION 



One of the pillars of the contemporary theory of financial economics is the notion of correlation matrices of timeseries of financial instruments; the capital asset pricing model and Markowitz portfolio theory are probably the most prominent examples. Recent empirical analyses of the detailed structure of financial correlation matrices have shown that there exist remarkable deviations from the predictions expected from the efficient market hypothesis. In particular, based on pioneering work, eigenvalue spectra of empirical equal-time covariance matrices have been analyzed and compared to predictions of eigenvalue densities for Gaussian randomness obtained from random matrix theory (RMT). It has been shown that the eigenvectors which strongly depart from the spectrum obtained by RMT contain information about the sector organization of markets [5, 6]. The largest eigenvalue has been identified as the 'market mode', and it has been pointed out that a 'cleaning' of the original correlation matrices, by removing the noise part of the spectrum explainable by RMT, results in an improved mean-variance efficient frontier which seems to be much more adequate than the one obtained by Markowitz (see e.g. recent discussions in the literature). Further, RMT provides an almost full understanding of why the Markowitz approach is close to useless in actual portfolio management (dominance of small eigenvalues which lie in the noise regime). 

RMT was initially proposed half a century ago to explain the energy spectra of complicated nuclei. In its simplest form, a random matrix ensemble is an ensemble of $N \times N$ matrices $M$ whose entries $M_{ij}$ are uncorrelated iid random variables, and whose distribution is given by 

$$P(M) \sim \exp\left(-\frac{\beta N}{2}\,\mathrm{Tr}(M M^{T})\right) , \qquad (1)$$

where $\beta$ takes specific values for different ensembles of matrices (e.g. depending on whether the random variables are complex- or real-valued). Eigenvalue spectra and correlations of eigenvalues in the limit $N \to \infty$ have been worked out for symmetric $N \times N$ random matrices by Wigner. For real-valued matrix entries, such symmetric random matrices are sometimes referred to as the Gaussian orthogonal ensemble (GOE). 

The symmetry constraint was later relaxed by Ginibre, and the probability distributions of different ensembles (real, complex, quaternion), known as Ginibre ensembles (GinOE, GinUE, GinSE), have been derived in the limit of infinite matrix size. For ensembles of random real asymmetric matrices (GinOE), the most difficult case, progress has been made only slowly and with great effort over the past decades. The eigenvalue density could finally be derived via different methods [10, 11], where, quite remarkably, the finite-size dependence of the ensemble has also been elucidated. Progress in this field is ongoing. 

However, these developments in RMT do not yet take into account the timeseries character of financial applications, i.e. the fact that one deals, in general, with (lagged) covariance matrices stemming from finite rectangular $N \times T$ data matrices $X$, which contain data for $N$ different assets (or instruments) at $T$ observation points. The matrix ensemble corresponding to the $N \times N$ covariance matrix $C \sim XX^{T}$ of such data is known as the Wishart ensemble [13] and is a cornerstone of multivariate data analysis. For the case of uncorrelated Gaussian distributed data, the exact solution for the eigenvalue spectrum of $XX^{T}$ is known as the Marčenko-Pastur law (for $N \to \infty$) and has been used as a starting point for random matrix analysis of correlation matrices at lag zero. Moreover, a quite general methodology for extracting meaningful correlations between variables has been discussed, based on a generalization of the Marčenko-Pastur distribution. The underlying method was the powerful tool of singular-value decomposition, and RMT was used to predict singular-value spectra of Gaussian randomness. 

The time-lagged analogue of the covariance matrix is defined as $C^{\tau}_{ij} \sim \sum_t r^i_t r^j_{t-\tau}$, where one timeseries is shifted by $\tau$ timesteps with respect to the other. In contrast to (real-valued) equal-time correlation matrices of the Wishart ensemble, which have a real eigenvalue spectrum, the spectrum of $C_\tau$ is defined in the complex plane, since matrices of this type are in general asymmetric. While the complex spectrum of $C_\tau$ has remained unknown so far, results for symmetrized lagged correlation matrices have been reported recently [15, 16]. There it was also shown that the methodology of free random variables can be used to tackle a variety of correlated (symmetric) Wishart matrix models. 

However, it is the analysis of the original asymmetric time-lagged correlations which forms a fundamental part of finance and econometrics, and which has attracted considerable attention in the respective literature. The existence of asymmetric lead-lag relationships was initially reported for the U.S. stock market [17]. Specifically, it was found that returns of large stocks lead those of smaller ones. Later, trading volume was identified as a significant determinant of such lead-lag patterns, and returns of high-volume stocks (portfolios) were found to lead those of low-volume stocks (portfolios). These lead-lag effects have primarily been explained by different effects of information adjustment asymmetry. For instance, a model was brought forward in which it was argued that lagged correlations may emanate as soon as previous price changes are observed and marketwide information can thus be incorporated in the marketmakers' evaluation of stock prices. Another type of information asymmetry can be seen in the different number of investment analysts following a firm's stock price [20]. Other explanatory approaches include the institutional ownership of stocks, the different exposure of stocks to persistent factors [22], or transaction costs and market microstructure as causes of lagged autocorrelations. Whether non-synchronous trading may constitute a source of lead-lag relationships is an issue of ongoing discussion [24, 25]. Recently, aiming at a closer empirical understanding of lagged correlations, the dependence of the strength of lagged correlations on the chosen time-shift $\tau$ has been analyzed for high-frequency NYSE data [26]. It was shown that the lagged correlation function typically exhibits an asymmetric peak. The revealed patterns basically showed structures consistent with those found in [17] (e.g. patterns where more 'important' companies pull smaller, less 'important' ones). Interestingly, evidence for a diminution of the Epps effect has also been demonstrated based on lagged cross-correlations of NYSE data, as lead-lag dependencies seem to diminish over the years. 

Diverse, interesting and ongoing as these approaches are, the methods applied are mainly based on Granger causality, vector autoregressive models and shrinkage estimators. In this paper, we want to extend the methodology to eigenvalue analysis of time-lagged correlations. First, we discuss how solutions of RMT problems pertaining to real, asymmetric matrices can be obtained from solutions to the symmetrized problem via an inverse Abel-transform. The respective developments will then enable us to derive the form of the eigenvalue spectra of the pure random case. As an immediate application we compare these theoretical results with real financial data and relate the observed deviations to market-specific features. 

The paper is organized as follows: In Section 2 we fix the notation and develop the spectral form of asymmetric real random correlation matrices. In Section 3 we apply the introduced methodology to empirical correlation matrices of 5 min log-returns of the S&P500 and discuss the meaning of deviant eigenvalues from several perspectives. Time-dependence issues are discussed in Section 4, and in Section 5 we finally conclude. 



2. SPECTRA OF TIME-LAGGED 
CORRELATION MATRICES 

2.1. Notation 

The entries of the $N \times T$ data matrices $X$, for $N$ assets and $T$ observation times, are the log-return time-series of asset $i$ at observation times $t$, 

$$r^i_t = \ln S^i_t - \ln S^i_{t-1} , \qquad (2)$$

after subtraction of the mean and normalization to unit variance, i.e. division by $\sigma_i = \sqrt{\langle (r^i_t)^2\rangle - \langle r^i_t\rangle^2}$. Here, $S^i_t$ is the price of asset $i$ at time $t$. One time unit is the time difference between observations at $t+1$ and $t$, e.g. a day or 5 minutes; for tick data it can also be of variable size. Time-lagged correlation functions of unit-variance log-return series among stocks are defined as 

$$C^{\tau}_{ij}(T) = \left\langle \left(r^i_t - \langle r^i_t\rangle\right)\left(r^j_{t-\tau} - \langle r^j_{t-\tau}\rangle\right)\right\rangle_T , \qquad (3)$$

where the time-lag $\tau$ is measured in time units and $\langle\dots\rangle_T$ stands for a time-average over the period $T$. We drop the argument $(T)$ in the following, except where the sampling period is varied explicitly. Equal-time correlations are obviously obtained for $\tau = 0$. For $\tau \neq 0$, the lagged correlation matrix $C_\tau$ is generally not symmetric and contains the lagged autocorrelations in the diagonal. It can be written as 

$$C_\tau = \frac{1}{T}\, X D_\tau X^{T} , \qquad (4)$$

where $D_\tau = \delta_{t,t+\tau}$ and where $X$ is the $N \times T$ normalized time-series data. Denoting the eigenvalues of $C_\tau$ by $\lambda_i$ and their associated eigenvectors by $u_i$ (or $u_{ik}$), where $i, k = 1, \dots, N$, we may write the eigenvalue problem as 

$$\sum_j C^{\tau}_{kj}\, u_{ij} = \lambda_i u_{ik} . \qquad (5)$$

We immediately recognize that the eigenvalues $\lambda_i$ are either real or come in complex conjugate pairs, since the matrix elements of $C_\tau$ are real and thus the conjugate eigenvalue $\lambda_i^*$ also solves Eq. (5). Regarding the elements of $C_\tau$ as random variables with a certain distribution, we should keep in mind that their specific construction, Eq. (4), results in a departure from a 'purely' random real asymmetric $N \times N$ matrix whose entries are iid Gaussian distributed. Thus we do not, in general, expect a flat eigenvalue distribution as in the Ginibre-Girko case. Rather, we can interpret $C_\tau$ as a random real asymmetric matrix with a special structure due to its construction. In general, comparably little work has been done to understand the eigenvalue spectra of such random real asymmetric matrices. Unfortunately, powerful addition formalisms developed for non-Hermitian random matrices are not applicable in the case of random real asymmetric matrices. However, it was shown that the problem can be treated in a way formally equivalent to classical electrostatics, and a generalization of Girko's semicircular law could be recovered via application of the replica technique. 
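The construction of Eq. (4) can be illustrated with a short numerical sketch. The function name `lagged_correlation`, the matrix sizes and the random seed are illustrative assumptions, and the direction of the shift follows the reading of Eq. (3) in which series $j$ is evaluated at $t - \tau$; the opposite convention simply transposes the result.

```python
import numpy as np

def lagged_correlation(X, tau):
    """Lagged correlation matrix C_tau = (1/T) X D_tau X^T for an N x T array X.

    Assumption: C_ij = <r^i_t r^j_{t-tau}>, i.e. series j is shifted back by tau.
    """
    N, T = X.shape
    # Normalise every series to zero mean and unit variance, as in the text.
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    # Multiplying by the shift matrix D_tau is equivalent to slicing the columns.
    return Z[:, tau:] @ Z[:, :T - tau].T / T

# Illustrative data: N = 50 uncorrelated Gaussian series of length T = 500.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 500))

C0 = lagged_correlation(X, 0)   # equal-time case: symmetric, unit diagonal
C1 = lagged_correlation(X, 1)   # lag one: real but asymmetric
ev = np.linalg.eigvals(C1)      # eigenvalues are real or complex-conjugate pairs
```

Because `C1` is real, the set of eigenvalues in `ev` is mapped onto itself by complex conjugation, which is the property stated for Eq. (5).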

2.2. General Arguments 

We start our arguments from the electrostatic potential analogy, originally introduced by Wigner. The idea is to interpret the distribution of eigenvalues in the complex plane as a distribution of electrical charges in 2 dimensions. Following the same arguments as in [30], the corresponding potential in 2 dimensions is given by 

$$\phi(x,y) = -\frac{1}{N}\left\langle \ln\det\left( (\delta_{ij} z^* - C_\tau^{T})(\delta_{ij} z - C_\tau) \right)\right\rangle_c , \qquad (6)$$

where $z = x + iy$, and $\langle\dots\rangle_c$ denotes the average over the distribution, 

$$P(X) \sim \exp\left(-\frac{N}{2}\,\mathrm{Tr}(X X^{T})\right) , \qquad (7)$$

of the matrices $X$. It can be shown [33] that Eq. (6) allows for the calculation of a density $p(z) = p(x, y)$ via the Poisson equation 

$$p(x,y) = -\frac{1}{4\pi}\,\Delta\phi(x,y) . \qquad (8)$$

Expanding the argument of the determinant in Eq. (6) we obtain the positive definite matrix 

$$H_{ij} = \delta_{ij}|z|^2 + (C_\tau^{T} C_\tau)_{ij} - x\,(C^{\tau}_{ij} + C^{\tau}_{ji}) + iy\,(C^{\tau}_{ij} - C^{\tau}_{ji}) . \qquad (9)$$

This form shows that any symmetric (anti-symmetric) contribution of $C^{\tau}_{ij}$ only influences the real (imaginary) part of $z$. 
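As a quick consistency check of the expansion leading to Eq. (9), the product form and the expanded form can be compared numerically. This is only a sketch: the small random matrix `C` is a stand-in for $C_\tau$, and the values of `x` and `y` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
C = rng.standard_normal((N, N)) / np.sqrt(N)   # stand-in for a real asymmetric C_tau
x, y = 0.3, -0.7                               # arbitrary point z = x + iy
z = x + 1j * y
I = np.eye(N)

# Argument of the determinant in the potential: (z* - C^T)(z - C)
H_product = (np.conj(z) * I - C.T) @ (z * I - C)

# Expanded form: |z|^2 + C^T C - x (C + C^T) + i y (C - C^T)
H_expanded = abs(z) ** 2 * I + C.T @ C - x * (C + C.T) + 1j * y * (C - C.T)

# H is Hermitian, so its eigenvalues are real; they are non-negative
# because H = (z - C)^dagger (z - C) for real C.
H_eigs = np.linalg.eigvalsh(H_product)
```

The symmetric part $C + C^{T}$ multiplies $x$ and the anti-symmetric part $C - C^{T}$ multiplies $y$, which is the separation used in the argument above.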

If there is no structural difference in the randomness of the symmetric and the anti-symmetric part of the matrix $C_\tau$, the expression of Eq. (9) is equivalent under exchange of $x$ and $y$ in the distribution sense, and the potential $\phi$ will thus be a symmetric function in $x$ and $y$. Since we do not expect any direction in the complex plane to be distinguished from any other in the limit $N \to \infty$, we conceive that the eigenvalue density resulting from the Poisson equation is a radial symmetric function, i.e., 

$$p(x,y) \equiv p(r) = \frac{1}{2\pi r}\int dz\, p(z)\,\delta(|z| - r) . \qquad (10)$$

A more formal argument can be given via expanding the matrix $H_{ij}$ entering the potential $\phi$ [31]. Since the entries $C_{ij}$ are typically smaller than one, $H_{ij}$ can be written as $H_{ij} \approx |z|^2 (A + \epsilon B)$. Here, $\epsilon$ is a small perturbation, $A = \delta_{ij}$ and $B = C_\tau^{T} C_\tau/|z| - \hat{x}\,(C_\tau + C_\tau^{T}) + i\hat{y}\,(C_\tau - C_\tau^{T})$ with $\hat{x} = x/|z|$ and $\hat{y} = y/|z|$. We fix $|z| = 1$ without loss of generality and write the determinant as a Taylor series, 

$$\phi(x,y) = -\frac{1}{N}\left\langle \ln\det(H_{ij})\right\rangle_c = -\frac{1}{N}\left\langle \mathrm{Tr}\ln(H_{ij})\right\rangle_c \approx -\frac{1}{N}\left\langle \mathrm{Tr}(B) - \mathrm{Tr}\left(\frac{B^2}{2}\right) + \mathrm{Tr}\left(\frac{B^3}{3}\right) - \dots \right\rangle_c . \qquad (11)$$

Based on this series, we checked up to fourth order that this expansion indeed only leads to terms in $r$ for $N \to \infty$; we outline some aspects of the calculation in Appendix A. We note that a different and probably even more powerful way of proving our conjecture would be to replace the determinant in Eq. (6) by Gaussian integrals and use the replica method to average over the distribution of the $C_\tau$. 

If $p(r)$ is circular symmetric, the support $S$ of the eigenvalue spectrum will be bounded by a circle and is thus definable via a maximal radius $r_{\max}$. Since $r_{\max}$ is governed by the standard deviation of the underlying random matrix elements, one can compute the extent of the support of $C_\tau$ by considering the support of symmetric ($r^{S}_{\max}$) and anti-symmetric matrices ($r^{A}_{\max}$). Let these be defined by $C^{S}_\tau = \frac{1}{2}(C_\tau + C_\tau^{T})$ and $C^{A}_\tau = \frac{1}{2}(C_\tau - C_\tau^{T})$. If we assume that the standard deviations of the symmetric and anti-symmetric matrices are equal, $\sigma_S = \sigma_A$, this implies that the standard deviation $\sigma$ of the matrix $C_\tau$ will be $\sigma = \sqrt{2}\,\sigma_S/2$. Thus, the support of $C_\tau$ can be defined via a disc with radius 

$$r_{\max} = \frac{1}{\sqrt{2}}\, r^{S}_{\max} = \frac{1}{\sqrt{2}}\, r^{A}_{\max} . \qquad (12)$$

The argument here is that the eigenvalue density can be regarded as a log-gas which has only one degree of freedom for $C^{S}_\tau$ and $C^{A}_\tau$, but two degrees of freedom for $C_\tau$, hence leading to $\sigma = \sqrt{2}\,\sigma_S/2$ instead of $\sqrt{2}\,\sigma_S$. 

Based on these relations and regarding the discussion of Eq. (10), it is sensible to conjecture that the projection of $p(r)$ onto the x-axis, denoted by $p_x(\lambda)$, and the projection onto the y-axis, $p_y(\lambda)$, are nothing but the rescaled spectra of the solution to the symmetric problem, $p^{S}(\lambda)$, and to the anti-symmetric problem, $p^{A}(\lambda)$. To be more explicit, 

$$p_x(\lambda) \equiv p(\mathrm{Re}(\lambda)) = \int_S p(r)\,dy = p^{S}(\sqrt{2}\,x) ,$$
$$p_y(\lambda) \equiv p(\mathrm{Im}(\lambda)) = \int_S p(r)\,dx = p^{A}(\sqrt{2}\,y) , \qquad (13)$$

where the integration extends over the support $S$ in the complex plane. Although this conjecture might seem quite natural, we shall provide numerical evidence for its correctness below. 

First, we note that the eigenvalue density of the symmetric problem can be obtained from the well-known relation 

$$p^{S}(x) = \sum_n \delta(x - \lambda_n) = \frac{1}{\pi}\lim_{\epsilon\to 0}\left[\mathrm{Im}\left(G^{S}(x - i\epsilon)\right)\right] . \qquad (14)$$

For a radial symmetric problem, of course, $p^{S} = p^{A}$. The main idea of this work is now to note that one can use the following technique to actually determine the radial symmetric density $p(r)$: 

Since the rescaled eigenvalue density of the symmetrized problem $p^{S}(\sqrt{2}\,x)$ is nothing but the projection of $p(r)$ onto the real axis, Eq. (13), it can be written as the Abel-transform, 

$$p^{S}(\sqrt{2}\,x) = 2\int_x^{\infty} \frac{p(r)\,r}{\sqrt{r^2 - x^2}}\,dr , \qquad (15)$$

of the radial density $p(r)$. One can then reconstruct the desired eigenvalue spectrum exactly (in the limit $N \to \infty$) via the inverse Abel-transform, and thus via the cuts of the Greens function of the symmetric problem, 

$$p(r) = -\frac{1}{\pi}\int_r^{\infty} \frac{d}{dx}\left(\frac{1}{\pi}\lim_{\epsilon\to 0}\left[\mathrm{Im}\left(G^{S}(\sqrt{2}\,x - i\epsilon)\right)\right]\right) \frac{dx}{\sqrt{x^2 - r^2}} . \qquad (16)$$

Here, we have made use of Eq. (14). Since Eq. (16) can be problematic if evaluated numerically, we also specify a form which exploits the Fourier-Hankel-Abel cycle, 

$$p(r) = 2\pi\int_0^{\infty} q\,J_0(2\pi r q)\int_{-\infty}^{\infty} p^{S}(\sqrt{2}\,x)\,e^{-2\pi i x q}\,dx\,dq , \qquad (17)$$

where $J_0(x)$ denotes the zeroth-order Bessel function. We also note that yet another method of determining $p(r)$ is the evaluation of the inverse Radon-transform of $p^{S}(\sqrt{2}\,x)$. 

FIG. 1: Complex eigenvalue spectra of time-lagged correlation matrices, obtained from random matrices $X$. The entries of $X$ are iid and Gaussian with unit variance. In (a), (c), (e) and (g) the position of the eigenvalues is shown in the complex plane for values of $Q = T/N = 100$, 10, 1 and 0.5, respectively. The visibly enhanced density along the real axis is the finite-size effect mentioned in the text. The right column shows the projections of the eigenvalues onto the real and imaginary axis. The solid lines are the theoretically expected curves, which are numerical solutions to Eq. (18). Note in (h) that for this projection, the eigenvalue spectrum is composed of different solutions to Eq. (18), as $G(z)$ itself has a discontinuity. The divergence at $z = 0$ is not shown for analytical curves associated with $Q = 100$, 10 and 0.5. 
Equation (16) applies for any radial symmetric eigenvalue density in the limit $N \to \infty$ and allows for a calculation of the eigenvalue density in the complex plane via a method of exact reconstruction based on the eigenvalue density of the symmetrized (or anti-symmetrized) problem. Typically, the solution of the symmetric problem will be valid only in the $N \to \infty$ limit. Thus, although the Abel-inversion gives an exact result, discrepancies may occur because of finite-size effects. Before turning to the specific problem of lagged correlation matrices we refer to Appendix B, where, as a specific and prominent example, we show the almost trivial case of deriving the density of real asymmetric random matrices (without 'imposed structure') directly from Wigner's celebrated semicircle law. 
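The relation between a semicircle-shaped projection and a flat density on a disc can be checked numerically with a minimal sketch of the Abel transform pair. The function names, the unit radius, the normalisation and the quadrature scheme below are illustrative assumptions and are not taken from the appendices; the rescaling by $\sqrt{2}$ is left out for simplicity.

```python
import math

def abel_project(rho, x, R, n=2000):
    """Projection F(x) = 2 * int_x^R rho(r) r / sqrt(r^2 - x^2) dr.

    The substitution r = sqrt(x^2 + s^2) removes the square-root singularity,
    leaving F(x) = 2 * int_0^S rho(sqrt(x^2 + s^2)) ds with S = sqrt(R^2 - x^2).
    """
    S = math.sqrt(max(R * R - x * x, 0.0))
    h = S / n
    return 2.0 * h * sum(rho(math.hypot(x, (k + 0.5) * h)) for k in range(n))

def abel_invert(dF, r, R, n=2000):
    """Inverse Abel transform rho(r) = -(1/pi) * int_r^R F'(x) / sqrt(x^2 - r^2) dx.

    Uses x = sqrt(r^2 + s^2) and s = S sin(theta), which also tames
    square-root singularities of F' at the edge of the support.
    """
    S = math.sqrt(max(R * R - r * r, 0.0))
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        x = math.hypot(r, S * math.sin(theta))
        total += dF(x) / x * S * math.cos(theta)
    return -total * h / math.pi

# Flat density on the unit disc, normalised to one.
flat_disc = lambda r: 1.0 / math.pi if r <= 1.0 else 0.0
# Derivative of the semicircle-shaped projection F(x) = (2/pi) sqrt(1 - x^2).
d_semicircle = lambda x: -(2.0 / math.pi) * x / math.sqrt(1.0 - x * x)

projection_at_03 = abel_project(flat_disc, 0.3, 1.0)   # compare with (2/pi) sqrt(1 - 0.09)
density_at_05 = abel_invert(d_semicircle, 0.5, 1.0)    # compare with 1/pi
```

In this toy setting the forward transform of the flat disc reproduces the semicircle shape, and the inverse transform of the semicircle recovers the constant value $1/\pi$ inside the disc.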



2.3. Application to lagged correlation matrices 

We now turn to our specific problem of determining the eigenvalue density of $C_\tau$. What is left is to confirm the validity of our conjecture, Eq. (13), and to show that, as a consequence, Eq. (16) gives an approximation to the radial eigenvalue distribution, $p(r)$. To start, we can refer to existing literature on the symmetric problem: It has been shown [15, 16] that the Greens function $G(z)$ of the symmetric problem $C^{S}_\tau = \frac{1}{2T} X (D_\tau + D_{-\tau}) X^{T}$ is given by 
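$$\cdots\;\left(z^2 - \left(\tfrac{1}{Q} - 1\right)^2\right) G^2(z)\;\cdots\; - 2\left(\tfrac{1}{Q} - 1\right) z\,G(z)\;\cdots\; 2 - \tfrac{1}{Q}\;\cdots \qquad (18)$$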

with $Q = T/N$ playing the role of an information-to-noise ratio. Note that this equation is independent of the specific value of $\tau$ and is valid for any value of it [16]. 

We note that, in a calculation analogous to the one for the symmetric problem, it is easy to show that the Greens function pertaining to the asymmetric problem follows exactly the same equation, which reaffirms circular symmetry. Based on Eq. (18) one can calculate $p_x(\lambda)$ by using Eqs. (14) and (13). 
Figure 1 shows (simulated) spectra of $C_{\tau=1}$ as defined by Eq. (4), with iid entries in the columns of $X$, for various values of $Q$. Note that for $Q < 1$ the shape of the boundary of eigenvalues in the complex plane changes from a disk to an annulus (a disc-annulus phase transition of this kind has been discussed for non-Hermitian matrix models). We immediately recognize that eigenvalues are enhanced along the real axis and that, as a consequence, the density is lower in the vicinity of the real axis. This can be attributed to a well-known finite-size effect, already discussed in [10, 30]. Of course, this effect implies that circular symmetry is not fully fulfilled for finite matrices of the GinOE. Thus, we also expect to observe some discrepancies between the theoretical results based on the Abel-transform and the empirical densities of finite, lagged correlation matrices based on random data. 
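The finite-size accumulation of eigenvalues on the real axis can be seen in a small simulation. The following is a sketch under assumed parameters (the function name `lagged_spectrum`, $N = 100$, $T = 1000$ and the seed are illustrative choices, not the settings used for Fig. 1).

```python
import numpy as np

def lagged_spectrum(N, T, tau=1, seed=0):
    """Eigenvalues of C_tau = (1/T) X D_tau X^T for iid unit-variance Gaussian X."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, T))
    C = X[:, tau:] @ X[:, :T - tau].T / T
    return np.linalg.eigvals(C)

N = 100
ev = lagged_spectrum(N, T=1000)               # Q = T/N = 10

# For a real matrix the solver returns an exactly zero imaginary part for real
# eigenvalues, so they can be counted directly.
n_real = int(np.sum(np.imag(ev) == 0.0))
fraction_real = n_real / N                    # would vanish for a truly continuous density
radius = float(np.max(np.abs(ev)))            # empirical extent of the support
```

Since complex eigenvalues of a real matrix come in conjugate pairs, `n_real` always has the same parity as `N`; a non-negligible `fraction_real` at finite `N` is the real-axis enhancement discussed above.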
FIG. 2: Radial eigenvalue densities approximated via simulations along different directions (real axis, imaginary axis and the diagonal in the complex plane). Numerical data for finite matrices is compared with the solution of the inverse Abel-transform. (a) Q=100 (b) Q=10 (c) Q=1; the inset shows a detail of the curve. 



In our concrete case, the prediction of the projections $p_x$ and $p_y$ (blue lines, obtained from Eq. (14) and Eq. (18)), depicted in the right column of Figure 1, is in good agreement with the numerical data for the real parts of the eigenvalues ($p_x$). For the projection of the imaginary parts ($p_y$) we recognize a slight deviation from the prediction (due to the enhanced density along the real axis). We also checked projections with data obtained by rotating all the individual eigenvalues in the complex plane by different angles. Apart from some minor effects attributable to the inhomogeneity around the real axis we found no significant discrepancies. We also note that the simulated data did not show any significant discrepancies when taking different values of $\tau$, which is again in agreement with the theoretical anticipation. 

Turning to the reconstruction of the radial eigenvalue density, the function to be transformed ($p^{S}(\sqrt{2}\,x)$ or $p^{A}(\sqrt{2}\,y)$) may be evaluated exactly (with some effort) for the symmetric case from Eq. (14) and Eq. (18). The remaining integral, Eq. (16), will, however, be hard to solve in general. Nonetheless, we are able to solve the case $Q = 1$ analytically and obtain the exact formula for the eigenvalue density, 
(19) 

with a prefactor $K$. Here, $\Gamma(x)$ denotes the Gamma function and ${}_2F_1(a, b, c, z)$ the hypergeometric function; the derivation is briefly summarized in Appendix C. Note that $\lim_{Q\to 0} G_Q(z) = \frac{1}{z}$, whereas for $Q \to \infty$ we expect the Greens function and the eigenvalue density to converge to those of a random real asymmetric matrix without specific structure, i.e. a flat eigenvalue density in the sense of [30]. 

We were not able to derive closed expressions for other values of $Q$, since already the solution of Eq. (18) results in lengthy expressions. In these cases we computed the integral Eq. (16) numerically. The results are depicted in Figure 2 for $Q = 100$, $Q = 10$ and $Q = 1$. The theoretical predictions are accompanied by data obtained from performing cuts along various directions of the spectra $p(x, y)$ from Fig. 1, namely along the x-axis, the y-axis and along the diagonal direction, i.e. $\mathrm{Re}(\lambda) = \mathrm{Im}(\lambda)$. We performed these cuts numerically by calculating the density in narrow strips along the different directions. The theoretical prediction captures the different experimental densities very well. Especially for $Q = 100$ and $Q = 1$, results are consistent with the predictions to a high degree. For $Q = 10$ we observe some discrepancies for values $r < 0.1$. We think that these are very probably associated with the finite-size effect of enhanced eigenvalue density along the real axis discussed above. A closer investigation of this effect, and a comparison with the known finite-size solution, would be interesting but remains outside the scope of the present work. 
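One way to perform such cuts is sketched below. The function `radial_cut`, the strip width and the bin edges are illustrative assumptions rather than the procedure actually used for Fig. 2; the routine is exercised on points drawn uniformly from the unit disc, for which the density is $1/\pi$ in every direction.

```python
import numpy as np

def radial_cut(ev, angle, half_width, edges):
    """Estimate the density along a ray at `angle` from points in a narrow strip.

    The coordinate along the ray is used as a proxy for the radius, which is a
    reasonable approximation only for r much larger than `half_width`.
    """
    w = ev * np.exp(-1j * angle)                       # rotate the ray onto the positive real axis
    in_strip = (np.abs(w.imag) < half_width) & (w.real > 0)
    counts, _ = np.histogram(w.real[in_strip], bins=edges)
    area = np.diff(edges) * 2 * half_width             # area of each strip segment
    return counts / (len(ev) * area)                   # normalised density per unit area

# Test case: points distributed uniformly on the unit disc.
rng = np.random.default_rng(2)
n = 200_000
pts = np.sqrt(rng.random(n)) * np.exp(2j * np.pi * rng.random(n))

edges = np.array([0.2, 0.4, 0.6, 0.8])
cuts = {name: radial_cut(pts, ang, 0.05, edges)
        for name, ang in [("real", 0.0), ("imag", np.pi / 2), ("diag", np.pi / 4)]}
```

For a circular symmetric spectrum the three cuts agree within statistical fluctuations; for finite lagged correlation matrices the cut along the real axis is the one expected to deviate.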



3. EMPIRICAL ANALYSIS 

With a theoretical concept of, and some specific knowledge about, the eigenvalue spectra of time-lagged correlation matrices, we now turn to actual financial data and study empirical lagged correlation matrices $C_\tau$. 



3.1. Data 

We analyze 5 min data of the S&P500 in the time period Jan 2 2002 - Apr 20 2004. The time-series were cleaned, corrected for splits and synchronized. In particular, days on which trading took place only in 'limited' form ('half-days' etc.) have been removed (this includes the dates Sep 11 2002, Dec 26 2003, Jan 19 2004, Feb 16 2004). Additionally, all assets for which more than 1.5% of data were missing and/or assets which were not quoted over the full time-frame have been removed. After cleaning, the data set $X$ consisted of $N = 400$ time-series at $T = 44720$ observation times each. The empirical time-series and their distribution functions showed the usual 'stylized facts' of high-frequency stock returns (fat tails, clustered volatility, etc.). Of course, the well-known structure of the correlation matrix element distribution at equal times was also found to be present in the data (not shown). For the remainder of the paper, we fix $\tau = 1$, i.e. a five minute shift, and $T = 44720$, if not stated otherwise. From $X$ we construct two surrogate data sets, one by removing the market mode, the other by a scrambling of data. As $\tau = 1$ remains unchanged during the rest of the paper, we will occasionally drop the subscript, $C_1 \equiv C$. 



3.1.1. Market mode removed data 

It is well known that the spectrum of equal-time correlations is dominated by a single very large eigenvalue which can be attributed to the so-called 'market mode'. Removing the 'market mode' is thus approximately equivalent to removing the movement of the 'index' of a given universe from the individual assets. We define the market return (the index) by $r^m_t = \sum_{j=1}^{N} v_{1j}\, r^j_t$, where $v_{1j}$ is the eigenvector associated with the largest eigenvalue $\lambda_1$ of the empirical covariance matrix at equal times, i.e. $\tau = 0$. To remove this market mode from the data we simply regress in the spirit of the CAPM 

$$r^i_t = \alpha^i + \beta^i r^m_t + \epsilon^i_t , \qquad (20)$$

where the residuals $\epsilon^i_t$ carry what is left of the structural information in the data; we denote this data set by $X^{\mathrm{res}}$, its elements being $X^{\mathrm{res}}_{it} = \epsilon^i_t$. 



3.1.2. Scrambled data 

A scrambled version $X^{\mathrm{scr}}$ is generated by a random permutation of all elements of $X$. This destroys all correlation structure but has exactly the same distributions as the original data. Correlation matrices from $X^{\mathrm{scr}}$ should, up to potential non-Gaussian effects in the distributions, correspond to the developments in Section 2. We checked that the support of the eigenvalue spectra pertaining to the lagged correlation matrices, which will be the quantity used for identifying deviating eigenvalues, indeed resembles the value $r_{\max}$ of the Gaussian case discussed in Section 2. A treatment of the exact spectra of lagged correlation matrices of random Lévy distributed data (as available in the literature for the case of equal-time covariance matrices) is beyond the scope of the present work. 

FIG. 3: (a) Empirical distribution $P(C_{ij})$ of the lagged correlation matrix elements $C_{ij}$ for a sampling period of $T = 40000$ and $T = 4000$ (inset). Circles represent empirical data, red squares the situation for scrambled data from $X^{\mathrm{scr}}$. (b) shows the same for the removed market case, i.e. from $X^{\mathrm{res}}$. Individual frequencies are normalized by the summed frequencies for each plot. 


3.2. Empirical time-lagged financial random 
matrices 

In Fig. 3 we show the distribution of matrix elements $P(C_{ij})$ (circles) of the empirical correlation matrix $C_1$, based on $X$ (a) and on the market-mode-removed data $X^{\mathrm{res}}$ (b). Squares show the results for the scrambled data, $C_1^{\mathrm{scr}}$. The inset shows the result for a shorter sampling time of $T = 4000$. Clearly, there is 'significant' correlation in the data in both cases, in contrast to the Gaussian prediction of the efficient market hypothesis. The effect of varying the time-difference aspect of lagged correlations has been carefully studied in [26], and we shall not discuss this issue here. However, we point out that, as expected, the lagged correlations at $\tau = 1$ were larger than for values of $\tau > 1$, which fully conforms with the findings of [26]. We also mention that correlations typically decrease with decreasing observation frequency (comparing 5 min data with hourly returns), but still remain well above the scrambled case (not shown). 

The situation for the market-mode-removed data $X^{\mathrm{res}}$, Fig. 3(b), shows that lagged correlations are not distributed according to the efficient market hypothesis either. The frequency of higher values of $C_{ij}$ is slightly reduced and the curve has significantly changed shape. In the semilogarithmic plot of Fig. 3 the positive regime clearly does not follow a quadratic (Gaussian) curvature, but rather an exponential one. This also applies to the data sampled from $T = 4000$ subperiods, depicted in the inset of Fig. 3. Both empirical distribution functions also exhibit clear non-random negative autocorrelations, which are the predominant source of the non-Gaussian tails for negative entries. 



3.2.1. Eigenvalue spectra 

We now proceed to the analysis of empirical eigenvalue spectra of the financial data. Figure 4(a)-(c) shows the eigenvalue spectrum obtained from $C$ at various stages. In Fig. 4(a) a few very strong deviations from the bulk of the eigenvalues are seen, most significantly one real eigenvalue $\lambda_1 \approx 4.6$ and a conjugate pair of complex eigenvalues. Fig. 4(b) is a detail of (a) where a clear shift of the bulk of the eigenvalues with respect to the Gaussian regime (circle) is observed. This shift can be attributed to two effects: First, each deviating positive real eigenvalue $\lambda_i$ is associated with a shift $s$ of the 'bulk' spectrum of $s \approx -\mathrm{Re}(\lambda_i)/N$ in the direction of the negative real axis. ('Departing' eigenvalues are those which have real parts larger than the radius of the theoretical support.) The shift of the 'disc' pertaining to this effect is then the sum of all effects from departing eigenvalues, $s_{tot} = -\frac{1}{N}\sum_i \mathrm{Re}(\lambda_i) \approx -0.031$. A second contribution to the shift is due to the non-zero diagonal entries of the correlation matrices $C_1$. The shift of the center of the disk explainable by the mean of the diagonal elements is $\bar{C}_{ii} = -0.029$, such that the overall displacement is $d = s_{tot} + \bar{C}_{ii} = -0.060$. When corrected for the total shift we arrive at Fig. 4(c). We repeated the same procedure for $C_1^{\mathrm{res}}$, getting $d = s_{tot} + \bar{C}_{ii} = -0.020 - 0.061 = -0.081$; the resulting displacement-corrected distribution is depicted in Fig. 4(d). The shift of the center of the support is thus quite simply explained. 
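The bookkeeping of the displacement can be written out as a small sketch. The helper `bulk_displacement` and the toy spectrum are hypothetical; only the values $-0.031$ and $-0.029$ in the last line are taken from the text.

```python
def bulk_displacement(eigenvalues, diag_mean, r_max):
    """Return (s_tot, d): the shift caused by departing eigenvalues and the total shift.

    Departing eigenvalues are those whose real part exceeds the support radius r_max.
    """
    n = len(eigenvalues)
    departing = [ev for ev in eigenvalues if ev.real > r_max]
    s_tot = -sum(ev.real for ev in departing) / n
    return s_tot, s_tot + diag_mean

# Toy spectrum: one departing eigenvalue at 4.6 and 399 eigenvalues at the origin.
toy = [complex(4.6, 0.0)] + [complex(0.0, 0.0)] * 399
s_tot_toy, d_toy = bulk_displacement(toy, diag_mean=-0.029, r_max=0.3)

# Total displacement for the values quoted in the text.
d_text = -0.031 + (-0.029)
```

In the toy case a single eigenvalue at 4.6 with $N = 400$ shifts the bulk by $-4.6/400 = -0.0115$, which shows how a handful of large departing eigenvalues can account for a shift of the order reported above.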

The eigenvalues lying outside the random regime should now be clearly associated with specific non-random structures, which will be examined below. For the eigenvalues within the circle, i.e. for the eigenvalues within the regime of Gaussian randomness, the natural expectation would be that these follow the Gaussian predictions developed in Section 2. 

FIG. 4: Eigenvalue spectra of lagged correlation matrices from 5 min S&P500 data. (a) shows the full spectrum with one very large deviation on the real axis ($\lambda_1 \approx 4.6$), and a large departing eigenvalue pair $\lambda_2 = \lambda_3^*$. (b) is a detail, clearly showing that the spectrum is shifted with respect to the 'bulk-disc'. (c) Spectrum corrected for the displacement $d$ as discussed in the text. (d) is the eigenvalue spectrum based on the market-mode-removed data, $X^{\mathrm{res}}$, also after displacement correction. The circles in plots (b)-(d) indicate the theoretical support discussed in Section 2. 

In Fig. 5 we compare the predictions from Section 2 with
the empirical data, showing projections of the empirical
eigenvalue data onto the real and imaginary axes. The
inset shows the theoretical prediction of the radial density
integrated over the polar angle, $2\pi r \rho(r)$, compared
with the empirical data, $\rho(|\lambda|)$. We chose an
'accumulated' representation since the data quality would be
unsatisfactory otherwise. The empirical spectra are
truncated at $\mathrm{Re}(\lambda) = 1$. Given the modest eigenvalue
statistics ($N = 400$) and the strong deviations outside the
theoretical support, the agreement between the theoretical
predictions for Gaussian noise and the empirical data
seems rather satisfying.

FIG. 5: Projection of the empirical spectrum pertaining to
Fig. 4 onto the real and imaginary axes. The blue line is the
analytical solution discussed in Section 2. The inset shows the
empirical distribution $\rho(|\lambda|)$ compared with its analytical
analogue $2\pi r \rho(r)$.



3.2.2. Interpretation of deviating eigenvalues 

Strong deviations from the theoretical pure random 
prediction indicate significant correlation structure in the 
data. It is intuitively clear that eigenvalues departing 
positively (negatively) on the real axis with no or only 
a small imaginary part will be the effect of symmetric 
(anti-) correlations. On the other hand, complex conju- 
gate eigenvalues departing on the imaginary axis will be 
attributable to asymmetric, non-Gaussian correlations. 

Thus, the departures of the largest eigenvalue in Fig. 4
(a) and (c) should be caused by a lagged correlation
structure pertaining either to a group of stocks or to all
of the stocks. On the other hand, we also see significant
non-symmetric correlations in $X$, reflected in
complex-conjugate pairs of eigenvalues with relatively large
imaginary parts. The residuals $X^{\mathrm{res}}$ show a large negative
real eigenvalue, indicating approximately symmetric
anti-correlations between stocks. Such a departure is not
visible for $X$.

For a closer inspection of which assets 'participate' in
a given eigenvector belonging to a deviating eigenvalue,
one usually defines the inverse participation ratio for the
eigenvectors $u_i$,

$$\mathrm{IPR}(u_i) = \sum_{k=1}^{N} |u_{ik}|^4 . \tag{21}$$
FIG. 6: (a) Inverse participation ratio as defined in Eq. (21)
as a function of the absolute value of $\lambda_i$. Circles represent
data from the empirical matrix, squares (inset) data from a
random analogue obtained from iid Gaussian distributed $X$.
(b) The same as above, but for eigenvectors obtained from the
data with the market mode subtracted out.



This ratio shows to what extent each of the $N = 400$
assets contributes to the eigenvector $u_i$. While a low IPR
means that assets contribute equally, a large IPR signals
that only a few assets dominate the eigenvector.
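
The two limiting cases mentioned above (equal contribution versus a few dominating assets) are easy to check numerically. The sketch below is a minimal illustration, not the code used for the analysis: the function name is hypothetical, and the explicit normalisation of the eigenvector is an assumption added so that unnormalised input gives the same result.

```python
def inverse_participation_ratio(u):
    """IPR of one eigenvector: sum_k |u_k|^4 after normalising sum_k |u_k|^2 = 1."""
    norm_sq = sum(abs(x) ** 2 for x in u)
    return sum(abs(x) ** 4 for x in u) / norm_sq ** 2


n_assets = 400
# All assets contribute equally: IPR = 1/N.
ipr_uniform = inverse_participation_ratio([1.0] * n_assets)
# A single asset dominates: IPR = 1.
ipr_localised = inverse_participation_ratio([1.0] + [0.0] * (n_assets - 1))
# Complex components enter only through their modulus.
ipr_complex = inverse_participation_ratio([1j, 1.0, -1.0, -1j])
```

For $N = 400$ the equal-contribution value is $1/400 = 0.0025$, which sets the scale against which large IPRs should be read.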

Figure 6(a) shows the IPRs for the empirical correlation
matrix $C_1$. The inset is a detail and also shows
the IPRs from scrambled data (squares). It appears that
the 'random' regime is not confined to an approximately
constant region of IPRs but varies quite widely. This is in
contrast to the symmetric case, where one has a constant
IPR for eigenvalues stemming from Gaussian randomness.
We checked that the fluctuations observed here are
already present in the Ginibre ensemble of real random
asymmetric matrices and are thus not associated with the
specific structure of $C_\tau$. Because the IPRs belonging
to the random case are not bound to a line, the
identification of eigenvectors that are strongly influenced
by only a few components is hindered to a certain extent.
However, one can nonetheless see that the largest departing
eigenvalue $\lambda_1$ is characterized by a rather small IPR,
indicating an influence of a large number of assets. In
contrast, some other deviant eigenvalues lie well above
the random regime, indicating the influence of only a few
stocks.

Sector                    GICS code   Number of stocks
Energy                    10          22
Materials                 15          27
Industrials               20          44
Consumer Discretionary    25          63
Consumer Staples          30          35
Healthcare                35          40
Financials                40          71
Information Technology    45          63
Telecommunication         50          11
Utilities                 55          24

TABLE I: Global Industry Classification Standard (GICS)
codes for the 10 main sectors of the S&P500 with the number
of stocks in these sectors, see www.standardandpoors.com.

Again, we compare with the situation found for the
residuals $X^{\mathrm{res}}$, which is given in Fig. 6(b). On average,
the IPRs of the deviating eigenvalues are larger than in
(a), indicating a more clustered structure. We further
analyzed an IPR-like quantity based only on the imaginary
parts, $\mathrm{IPR}(\mathrm{Im}(u_i)) = \sum_{k=1}^{N} \mathrm{Im}(u_{ik})^4$, and found
pure random behavior, except for $\lambda_2 = \lambda_3^{*}$ (not shown).
With evidence at hand for some group structure in the
lagged correlations, we now take a closer look at these
structures.



3.2.3. Sector organization in time-lagged data 

It is well known from RMT applications to covariance
matrices ($\tau = 0$) of financial data that the eigenvectors
$u_i$ of large eigenvalues can be associated with the sector
organization of markets. Let us label the different sectors
with $s$, and define

$$\Delta_{sk} = \begin{cases} 1 & \text{if stock } k \text{ belongs to sector } s \\ 0 & \text{otherwise.} \end{cases} \tag{22}$$
To visualize the influence of each sector $s$ on a given
eigenvector $i$, we calculate

$$I_{si} = \frac{1}{N_s} \sum_{k=1}^{N} \Delta_{sk}\, |u_{ik}| , \tag{23}$$



where $N_s$ is the number of stocks in the respective
sector $s$. We evaluate Eq. (23) for the S&P500, using the
standard sector classification scheme, the so-called GICS
code, which is summarized in Table I. Figure 7 shows the
contributions of the sectors to a set of selected eigenvalues
for the original (left column) and the market-mode
removed data (right column). In the case of the original
data, the information technology sector seems to play
a decisive role for the largest 3 eigenvalues, namely $\lambda_1$
and $\lambda_2 = \lambda_3^{*}$. This sector thus explains a large part of
the most distinctive non-random (symmetric and asymmetric)
structure in $C_1$. For other eigenvectors, for
example those of $\lambda_4$ and $\lambda_{10}$ and others not shown here, a
distinctive role is played by the energy and financial sectors,
respectively.

FIG. 7: Strength of participation, $I_{si}$, of the ten main sectors
of the S&P500 (according to the GICS code) in the eigenvectors
$u_i$ for some selected eigenvalues $\lambda_i$.
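
The sector participation of Eq. (23) amounts to a per-sector average of the eigenvector weights. The following Python sketch illustrates this under explicit assumptions: the function name and the toy eigenvector are hypothetical, the sector labels are the GICS codes of Table I, and the weight of stock $k$ is taken to be the modulus of its eigenvector component (if a squared modulus is intended, `abs(component)` has to be changed accordingly).

```python
def sector_participation(u, sector_of_stock):
    """Average eigenvector weight per sector.

    u               : eigenvector components (possibly complex), one per stock
    sector_of_stock : sector label of each stock, in the same order as u
    Returns a dict {sector label: (1/N_s) * sum over stocks in s of |u_k|}.
    """
    totals, counts = {}, {}
    for component, sector in zip(u, sector_of_stock):
        totals[sector] = totals.get(sector, 0.0) + abs(component)
        counts[sector] = counts.get(sector, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}


# Toy example with two sectors labelled by GICS code (10 = Energy, 55 = Utilities).
toy_u = [0.5, 0.5, 0.1, 0.1j]
toy_sectors = [10, 10, 55, 55]
participation = sector_participation(toy_u, toy_sectors)
```

In this toy example the Energy sector carries an average weight of 0.5 and the Utilities sector 0.1; an eigenvector without sector structure would give roughly equal values for all sectors.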



Results for $C_1^{\mathrm{res}}$ (right column) also show remarkable
deviations from the Gaussian efficient market prediction
(equal contribution of the individual sectors). Here, the
largest eigenvalue $\lambda_1^{\mathrm{res}}$ is associated with a strong
participation of the energy and utility sectors. In the second
eigenvector, the financial sector is dominant, whereas the
eigenvalue associated with the strong negative departure
on the real axis, $\lambda_3 \approx -1$, is not dominantly influenced
by any sector. For $\lambda_4 = \lambda_5^{*}$ we find a strong influence
of the energy sector. Other eigenvectors also indicate a
strong sectorial contribution (not shown).

For a quantitative discussion of the structure imposed
by the individual eigenvectors and eigenvalues, we
decomposed the (square) correlation matrices with respect to
individual eigenvalues,

$$C_1^{\lambda_i} = U^{-1}\, \mathrm{diag}(\lambda_i)\, U , \tag{24}$$

where $\mathrm{diag}(\lambda_i)$ denotes a diagonal matrix with only one
non-zero entry at the respective position, associated with
eigenvalue $\lambda_i$. In Fig. 8 (a) we display histograms of the
elements of $C_1^{\lambda_i}$ in the same way as in Fig. 3. The
largest contribution to $C_1$ is seen to originate from $\lambda_1$,
and the tails seem to follow a distinctive exponential
distribution. Thus, the structure associated with $\lambda_1$ is definitely
not Gaussian and exhibits specific (exponential) behavior
which is not visible in the distributions of the elements
of the full matrix $C_1$. The complex pair $\lambda_2 = \lambda_3^{*}$
carries predominantly negative correlations. The following
eigenvalues contribute much less. The 'humps' in the
histograms, e.g. seen for $\lambda_2 = \lambda_3^{*}$ and $\lambda_4$, indicate some
deterministic structure. In Fig. 8 (b) the same is shown
for the market-removed data. The positive tails of the
distribution of the entries of $C_1^{\lambda_1^{\mathrm{res}}}$ strongly deviate from
the Gaussian regime. This 'hump' can be understood as
a consequence of strong correlations of sectors 10 and 55
(Energy and Utilities), seen in Fig. 7. This effect is also
visible in a network visualization of the market-removed
matrix. We will now proceed to such a network view to
visualize and further discuss the findings of strong sectorial
contribution and strongly anomalous distributions of the
entries of $C_1^{\lambda_i}$.
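
The decomposition with respect to a single eigenvalue can be illustrated numerically. The sketch below uses NumPy's convention $C = V\,\mathrm{diag}(\lambda)\,V^{-1}$, with the right eigenvectors as columns of $V$; how this $V$ maps onto the matrix $U$ of Eq. (24) is an assumption of this example, and the function name and the random toy matrix are hypothetical.

```python
import numpy as np


def eigen_component(C, index):
    """Part of the square matrix C carried by a single eigenvalue.

    Uses NumPy's convention C = V diag(lam) V^{-1}, where the columns of V
    are right eigenvectors. Returns (component, eigenvalue).
    """
    lam, V = np.linalg.eig(C)
    V_inv = np.linalg.inv(V)
    # Outer product of the right eigenvector (column of V) with the
    # matching row of V^{-1}, weighted by the eigenvalue.
    component = lam[index] * np.outer(V[:, index], V_inv[index, :])
    return component, lam[index]


rng = np.random.default_rng(0)
C_toy = rng.standard_normal((6, 6)) / np.sqrt(6)   # toy real asymmetric matrix
parts = [eigen_component(C_toy, i)[0] for i in range(6)]
reconstruction = sum(parts)                         # should reproduce C_toy
```

The single-eigenvalue components add up to the full matrix. For a complex-conjugate pair of eigenvalues the two components are complex conjugates of each other, so only their sum is a real matrix whose entries can be put into a histogram.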



3.2.4. Lead-lag networks

Comparing the eigenvalue spectra of the residuals with
those of the initial data (Figure 4), it is apparent that
the market mode has a clear influence on the deviations
and that the largest eigenvalue for the residuals is
significantly reduced. As a matter of fact, one would expect
that removing the (equal-time) market mode also
eliminates much of the correlations pertaining to small firms
driven by large companies or similar 'star-like' structures
(i.e. any network structure where one stock leads or lags
many other stocks). In Fig. 9 (a) we show a network
view of the $C_1$ correlation matrix, where a link is drawn
for any $C_1^{ij} > 0.09$; (b) is the same after removal of the
market mode, with $C_1^{\mathrm{res},ij} > 0.033$. Clearly, while in (a)
there is not much clustering (except maybe for the utility
sector), in the market-removed scenario distinctive
clustering appears. As in the previous section, we identified
the nodes with the 10 most important sectors in the
market. Nodes are colored according to these sectors in Fig.
9 along the lines of the accompanying color scheme. The
identified clusters correspond very nicely with industry
sectors, as was found quite some time ago for $\tau = 0$.
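
The thresholding step behind such a network view can be sketched as follows. This is a minimal illustration only: the function name, the toy matrix and the node labels are hypothetical, the reading of an entry $(i, j)$ as a directed link from $i$ to $j$ depends on the lag convention of the correlation matrix, and skipping the diagonal (lagged autocorrelations) is an assumption of this sketch. The layout step (Kamada-Kawai) is not covered here.

```python
def lead_lag_edges(C, threshold, labels=None):
    """Directed edge list (i, j, weight) for all off-diagonal entries above threshold."""
    n = len(C)
    names = labels if labels is not None else list(range(n))
    edges = []
    for i in range(n):
        for j in range(n):
            # Diagonal entries are lagged autocorrelations and are skipped here.
            if i != j and C[i][j] > threshold:
                edges.append((names[i], names[j], C[i][j]))
    return edges


# Toy 3 x 3 lagged correlation matrix (hypothetical numbers).
C_toy = [
    [0.20, 0.10, 0.00],
    [0.05, 0.00, 0.00],
    [0.00, 0.12, 0.30],
]
edges = lead_lag_edges(C_toy, threshold=0.09, labels=["A", "B", "C"])
```

Because the lagged correlation matrix is not symmetric, the edge (A, B) can be present without (B, A), which is what makes the resulting graph a directed lead-lag network.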

Returning to an analysis of the original data, we look
at networks derived from the individual matrices $C_1^{\lambda_i}$, Eq.
(24), to visualize some 'qualitative structure' associated
with strongly deviant eigenvalues and thus with
the most 'orthogonal' aspects of the overall deviations.

FIG. 8: Histograms of entries in $C_1^{\lambda_i}$ for several strongly
deviating eigenvalues, for original (a) and market-mode
removed data (b).

For the largest eigenvalue $\lambda_1$, we identify a few
assets from the Information Technology (IT) sector
leading stocks of different sectors (not shown) with positive
lagged correlations. The most pronounced hubs from IT
were found to be AMAT, BRCM, INTC, KLAC, LLTC,
MSFT, MXIM, NVLS, YHOO and XLNX. Quite similarly,
the most prominent features of the conjugate pair
$\lambda_2 = \lambda_3^{*}$ can be associated with a hub-like influence of
the IT sector, this time, however, with a negative lagged
correlation. Networks pertaining to $\lambda_4$ and $\lambda_{10}$ primarily
exhibited intersectorial ties of the Energy and Financial
sectors, where we also observed hub-like anti-correlations
pointing from stocks of the Financial sector to the Energy
sector.

For the lagged correlation matrix of the residuals $X^{\mathrm{res}}$,
the largest eigenvalue $\lambda_1^{\mathrm{res}}$ shows a strong clustering of
the Energy and Utilities sectors, which is shown in Fig. 9(c). The
fact that practically no assets apart from the Energy and
Utilities sectors are represented fully conforms with
the top right panel of Fig. 7. The tight binding of these
sectors is also seen in Figs. 9(c) and 8(b). In the latter,
the strong tail corresponding to positive correlations
of $\lambda_1^{\mathrm{res}}$ seems to be a consequence of this binding. The
second largest eigenvalue, $\lambda_2^{\mathrm{res}}$, demonstrates organization
of the Financial sector, where some stocks, namely
BAC (Bank of America), FITB (Fifth Third Bank) and
C (Citigroup Inc.), dominate the others (not shown).
Closer inspection of the negative eigenvalue $\mathrm{Re}(\lambda_3) \approx -1$
reveals that it is mostly associated with time-lagged
anti-correlations between various sectors; the eigenvalue pair
$\lambda_4 = \lambda_5^{*}$ exhibits clustering of the Energy and the Consumer
Staples sectors.

In general, the analysis of the residuals effectively reveals
secondary information not seen before, which is
mainly attributable to the sectorization of stocks. Inferring
from causes to effects, this fact may explain in
part or all of the well investigated equal-time cross- 
correlations, see e.g. Q for a short description of an 
adequate model. In contrast to the residuals, the original
data exhibit many hub-like interactions, where the
assets lagging the hubs do not seem to belong to a specific
sector. The most pronounced leading hubs are stocks
from the IT sector, which has apparently 'led' the market
within the observed time period. As a side comment,
it does not seem to us that the associated leading stocks
were the ones with the highest market capitalization, as
would be implied by the finding of [17].



4. TIME DEPENDENCE 

In this section we discuss the time-dependence of the 
correlation matrices. We can immediately use the pre- 
diction of the support of the eigenvalue spectra in the 
complex plane C to determine a minimum sampling pe- 
riod T (or equivalently a minimum value of Q) at which 
the estimated cross-correlations still exhibit non-random 
structure. This is possible since we know that if eigenvalues
lie outside the support, the data are non-random.
Reducing $T$ too much, one expects to arrive at a very noisy
estimate of the lagged correlation matrix, which will manifest
itself in having no departing eigenvalues at all.

We calculate $C_1(T_i)$ for consecutive, non-overlapping
time periods $T_i$ and find that, very remarkably, clear
deviations from the predicted support occur down to an
information-to-noise ratio of $Q \approx 1.25$. This means that even
though noise is drastically increased for low values of $Q$,
non-random structures persist even at short time scales.
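
The sub-period procedure can be sketched as follows. The code is an illustration under explicit assumptions, not the implementation used for the analysis: the function names are hypothetical, the lag convention $C_\tau^{ij} = \langle z_i(t)\, z_j(t+\tau) \rangle_t$ and the normalisation by $T - \tau$ are assumed here, and the data are a random toy example in which one series deliberately lags another.

```python
import numpy as np


def lagged_correlation(X, tau=1):
    """Lagged correlation matrix of an (N, T) array of time series.

    Assumed convention (not fixed by the text here):
    C[i, j] = mean over t of z_i(t) * z_j(t + tau), with z the series
    normalised to zero mean and unit variance.
    """
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    T = Z.shape[1]
    return Z[:, : T - tau] @ Z[:, tau:].T / (T - tau)


def largest_eigenvalue_per_window(X, window, tau=1):
    """Largest-modulus eigenvalue of C_tau for consecutive, non-overlapping windows."""
    largest = []
    for start in range(0, X.shape[1] - window + 1, window):
        C = lagged_correlation(X[:, start : start + window], tau)
        lam = np.linalg.eigvals(C)
        largest.append(lam[np.argmax(np.abs(lam))])
    return np.array(largest)


# Toy data: N = 8 series with T = 400 points; series 1 repeats series 0 one step later.
rng = np.random.default_rng(1)
X_toy = rng.standard_normal((8, 400))
X_toy[1, 1:] = X_toy[0, :-1]
lam_max = largest_eigenvalue_per_window(X_toy, window=100)   # Q = 100 / 8 = 12.5
C_full = lagged_correlation(X_toy)
```

Each entry of `lam_max` would then be compared with the theoretical support radius for the corresponding $Q = T/N$; that radius comes from the theory and is not computed in this sketch.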

More specifically, we analyzed 11 correlation matrices
obtained from time slices of 4000 observations ($Q = 10$),
and 89 matrices of 500 time points each ($Q = 1.25$). For each
individual sub-period $T_n$, we compute lagged correlation
matrices $C_1(T_n)$ for the raw data as well as for the residuals
of the regression model, $C_1^{\mathrm{res}}(T_n)$. Figure
10(a) shows a plot of the absolute value, $\mathrm{abs}(\lambda_n)$, of the
maximal eigenvalue found for each sub-period, indexed
by $n$. The dashed blue line corresponds to the prediction
of the support, $r_{\max}$. We immediately recognize that
for $Q = 10$ as well as for $Q = 1.25$ the largest eigenvalue
lies significantly above the noise regime. On the
other hand, the absolute value of the largest eigenvalue
is quite volatile and anti-persistent for $Q = 1.25$. We
also observe that the largest eigenvalues with non-zero
imaginary parts (red squares) mainly occur at low values
of $\mathrm{abs}(\lambda_n)$, whereas real eigenvalues occur at larger absolute
values. If the eigenvalue is real, the lead-lag network is
dominated by strong, approximately symmetric effects;
for complex eigenvalues the network is dominated by
asymmetric correlations, i.e. anti-correlations may play
a distinctive part too. We find that if an eigenvalue $\lambda_1$
was real (i.e. marked by a blue circle in Fig. 10), the
analysis of the preceding sections always identified the
IT sector as mainly contributing to $u_1$ (for $Q = 10$). On
the other hand, if the largest eigenvalue was complex,
no unique interpretation appeared to be valid for all of
the sub-periods.

FIG. 9: Network view of $C_1$ (a). A link was drawn for $C_1^{ij} >
0.09$. The situation for the regressed scenario is shown in (b)
with a threshold of $C_1^{\mathrm{res},ij} > 0.033$. (c) shows the correlation
network for stocks belonging to the largest eigenvalue in the
regressed data (for entries of $C_1^{\lambda_1^{\mathrm{res}}}$ larger than 0.13). Two
sectors (Energy and Utilities) are tightly bound together. All
network pictures are results of a Kamada-Kawai algorithm.

FIG. 10: (a) Time dependence of the largest eigenvalue of
$C_1(T_n)$ as a function of the period index $n$ for $T = 500$
(main figure) and $T = 4000$ (inset). Values are plotted as
blue circles if the largest eigenvalue is located on the real axis
($\mathrm{Im}(\lambda^{\max}) = 0$) and as red squares otherwise. (b) Same for
the residuals $X^{\mathrm{res}}$.

In Fig. 10 (b) we show the same for our continuing
antagonist $X^{\mathrm{res}}$. Again, we observe $\mathrm{abs}(\lambda_n)$ being clearly
located above the random frontier for all sub-periods.
The movement of $\mathrm{abs}(\lambda_1)$ is less volatile. Closer
investigation of the underlying eigenvalues for $Q = 10$ revealed
changing participation of the sectors (measured by the
quantity $I_{si}$ as defined in Eq. (23)). In effect, for all of
the 11 sub-periods either the Energy sector (in periods 6-9) or
the Utilities sector (in periods 3, 5) appeared as primarily
contributing. In the rest of the periods, both of these
sectors were strongly represented in $I_{si}$.

The last question addressed in this analysis concerns
the correlations of the lagged correlation matrices: are
significant lagged correlations only found a posteriori, or
does the data indicate a possibility for a reasonable
prediction of future lead-lag structures? To this end we
calculate the correlation of matrix elements between the
lagged correlation matrices obtained from different
(non-overlapping) observation periods $T_n$ and $T_m$,

$$c(T_n, T_m) = \frac{\left\langle \left( C_1^{ij}(T_n) - \langle C_1^{ij}(T_n) \rangle_{ij} \right) \left( C_1^{ij}(T_m) - \langle C_1^{ij}(T_m) \rangle_{ij} \right) \right\rangle_{ij}}{\sigma_{T_n}\, \sigma_{T_m}} . \tag{25}$$

Here, the average extends over all matrix elements and
$\sigma_{T_n}$ denotes the standard deviation of the elements of the
matrix $C_1(T_n)$. Figure 11 depicts the characteristics we
obtained from empirical data. While the expected band of
correlation coefficients would be bounded by very small values (of
the order of 1/400), we find extremely significant
correlations, especially for the $Q = 10$ case. As expected,
the 'predictability' of future weighted lead-lag matrices
is significantly higher for lagged matrices calculated
over longer sub-periods. The inset of Figure 11 shows
$c^{\mathrm{res}}(T_n, T_m)$, i.e. the same quantity calculated for the
residual data. Overall correlations are lower in this case,
meaning nothing else than that the market-wide
movements exhibit predictable lead-lag structures. However,
note that for $Q = 10$ the fluctuations of $\mathrm{abs}(\lambda_n)$ depicted
in the inset of Fig. 10 are not mirrored by any specific
variation of $c(T_i, T_{i+d})$ in Fig. 11.
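
Equation (25) is the Pearson correlation coefficient between the entries of two lagged correlation matrices. A minimal Python sketch, with a hypothetical function name and toy matrices that are not taken from the data, could look as follows.

```python
from statistics import fmean, pstdev


def matrix_element_correlation(A, B):
    """Pearson correlation between the entries of two equally shaped matrices."""
    a = [x for row in A for x in row]
    b = [x for row in B for x in row]
    mean_a, mean_b = fmean(a), fmean(b)
    # Average over all matrix elements of the product of the deviations.
    covariance = fmean((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    # Normalise by the standard deviations of the two sets of matrix elements.
    return covariance / (pstdev(a) * pstdev(b))


# Toy matrices (hypothetical numbers).
A_toy = [[0.1, 0.2], [0.3, 0.4]]
B_scaled = [[0.2, 0.4], [0.6, 0.8]]         # same pattern, different scale
B_flipped = [[-0.1, -0.2], [-0.3, -0.4]]    # opposite pattern
c_same = matrix_element_correlation(A_toy, B_scaled)
c_flipped = matrix_element_correlation(A_toy, B_flipped)
```

The coefficient is insensitive to an overall rescaling of either matrix, so it measures how well the pattern of lead-lag relations in one period carries over to another, not the overall strength of the correlations.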

Although the present analysis of time dependence is
not comprehensive in every respect, we may state that
non-random structures persist down to quite low
information-to-noise ratios and that a significant part of the lagged
correlation matrices is predictable for future periods.
However, shortening the length of the sub-periods results
in decreasing predictability.

5. CONCLUSION 

We have applied random matrix theory to lagged
cross-correlation matrices and theoretically derived the
eigenvalue spectra emanating from the respective real
asymmetric random matrices as a function of the
information-to-noise ratio, $Q$. Specifically, we have shown that,
in the case of any eigenvalue 'gas' satisfying circular
symmetry, an inverse Abel-transform can be used to
reconstruct the radial density, $\rho(r)$, from rescaled
projections available via solutions of the symmetrized problem.
Based on these theoretical results, we analyzed empirical
cross-correlations of 5 min returns of the S&P500. For
the full time period observed, we found remarkable
deviations from the prediction of the efficient market
hypothesis and discussed various structural properties of these
deviations. We found the largest eigenvalue to be
associated with a sub-matrix of exponentially distributed
entries. This eigenvalue was associated with a strong
hub-like leading influence of the IT sector. Analyzing
data based on the residuals of a regression to common
movements, we found that cluster structure in the
lead-lag network is strongly enhanced. Looking at lagged
correlation matrices pertaining to sub-periods of the overall
investigation period, we found that deviations from the
theoretical prediction do occur at quite low
information-to-noise ratios. We also found that significant parts of
the lagged correlation matrix should be predictable via
measurements of past (non-overlapping) periods.

FIG. 11: Matrix element correlation $c(T_i, T_{i+d})$ as described
in Eq. (25) for various time lags $d$ for the original data and
for the residuals $X^{\mathrm{res}}$.

We think that the current work can be extended in
various directions. On the theoretical side, a closer
investigation of the nature of finite-size effects in the ensemble
of time-lagged correlation matrices and a comparison with
the exact finite-size result of the random real asymmetric
case would be tempting. Finite-size effects could also
be inferred from the terms which were found to vanish
in the $N \to \infty$ limit in Appendix A. We also think that
some work is needed towards an exact understanding of the
relation between the eigenvalue spectra (including the left
and right eigenvectors of the ensemble discussed here)
and the singular value decomposition of related problems
[36]. A rigorous study of a 'cleaning procedure' along
the lines of methods already worked out for equal-time
financial covariance matrices could be pursued as well.

Finally, we believe that the presented work should, in
general, allow for an eigenvalue-dependent, systematic
study of the influence of lagged correlation matrices and
their interplay with equal-time correlations between
financial assets in concrete models. The fact that cluster
structure conforming with market sectors can be found in
lagged correlation matrices already indicates the direction
of findings to be expected from such work.



The typical situation for higher order terms is similar to
the one for the second order term, i.e. the terms in $r$
generally depend on some function of $Q$, and the 'dangerous'
terms (like $[x^2 - y^2]^n$) vanish since they remain constant
for growing matrix size and are thus neutralized by the
prefactor $1/N$. We do not expect any different behavior
for terms higher than fourth order.

Appendix B 
Appendix A

Based on the series expansion (11) of the potential $\phi$,
we have calculated the first four terms in the series. For
the first term, one easily obtains



^lhn^l(Tr(B)>. 



(26) 

since all other terms vanish, as $\mathrm{Tr}(C)$ gives just $N$ times
the average of the autocorrelation of the assumed iid
white noise process. For calculating the second term, it
is useful to remember $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$ and $\mathrm{Tr}(CC) =
\mathrm{Tr}(C^T C^T)$, as well as to take into account that odd powers
of $C$ vanish. One then arrives at



Jhn^l(Tr(i.^)). 



= lim l(Tr(((C^^C^-')2),; 

N-^oo iV 

+ (a;2-y2)Tr((2C*^a'^)c)) 



(27) 



This structure is also typical for higher order terms (not
shown for brevity). The trace in the 'dangerous' term
proportional to $(x^2 - y^2)$ is nothing else than $N$ times the
variance of autocorrelations, which is just $1/T$ for a
Gaussian process. Thus, in total, the term vanishes as $1/T$ in
the limit $N \to \infty$ with $Q = \mathrm{const.}$, and one gets



lim ^TT{{B'')), = K + 2rQ- 

N^oc TV 



(28) 



In very similar calculations, it is easy (but tedious) to
check that

$$\frac{1}{N} \langle \mathrm{Tr}(B^3) \rangle \to f(r) \quad \text{and} \quad \frac{1}{N} \langle \mathrm{Tr}(B^4) \rangle \to g(r) . \tag{29}$$



The uniform eigenvalue distribution of real asymmetric 
matrices in the complex plane C found in can be 
almost trivially recovered from Wigner's semicircle law 
of real symmetric matrices via application of the inverse 
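Abel-transform. Starting from Wigner's semicircle law

$$\rho(\lambda) = \frac{1}{2\pi} \sqrt{4 - \lambda^2} ,$$

and after proper rescaling, $\rho_x(\lambda) = \frac{1}{\sqrt{2}\,\pi} \sqrt{4 - 2\lambda^2}$,
we may insert into Eq. (16) and arrive at

$$\rho(r) = \frac{1}{\pi^2} \int_r^{\sqrt{2}} \frac{\lambda \, d\lambda}{\sqrt{2 - \lambda^2}\, \sqrt{\lambda^2 - r^2}} = -\frac{1}{\pi^2} \arctan\!\left( \frac{\sqrt{2 - \lambda^2}}{\sqrt{\lambda^2 - r^2}} \right) \Bigg|_{\lambda = r}^{\sqrt{2}} = \frac{1}{2\pi} . \tag{30}$$

We immediately arrive at the result of a uniform
eigenvalue distribution,

$$\rho(r) = \begin{cases} \frac{1}{2\pi} & 0 \le r \le \sqrt{2} \\ 0 & \text{elsewhere.} \end{cases} \tag{31}$$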
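
The relation between the uniform disc and the rescaled semicircle can be cross-checked numerically in the forward direction. The sketch below does not perform the inverse Abel-transform itself; it is a Monte Carlo consistency check that swaps in random sampling. It samples points uniformly from a disc of radius $\sqrt{2}$ and compares their real parts with the projected density $\frac{1}{\pi}\sqrt{2 - x^2}$. Function names, sample size and seed are hypothetical choices.

```python
import math
import random


def projected_density(x, radius=math.sqrt(2.0)):
    """Projection onto the real axis of a uniform density on a disc of given radius."""
    if abs(x) >= radius:
        return 0.0
    return 2.0 * math.sqrt(radius**2 - x**2) / (math.pi * radius**2)


def sample_real_parts(n, radius=math.sqrt(2.0), seed=0):
    """Real parts of n points drawn uniformly from the disc |z| <= radius."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())      # uniform in area, not in radius
        phi = 2.0 * math.pi * rng.random()
        xs.append(r * math.cos(phi))
    return xs


xs = sample_real_parts(20000)
# Second moment of the projected (semicircle) density with radius sqrt(2) is 1/2.
second_moment = sum(x * x for x in xs) / len(xs)
# Fraction of real parts in (-0.5, 0.5), to be compared with the integral
# of the projected density over the same interval.
fraction_inside = sum(abs(x) < 0.5 for x in xs) / len(xs)
```

For radius $\sqrt{2}$, `projected_density` reduces to $\frac{1}{\pi}\sqrt{2 - x^2}$, which is the rescaled semicircle used in the derivation, so agreement of the sampled moments with this density supports Eq. (31) from the opposite direction.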



Appendix C 

For $Q = 1$, one solution can be written in the form



(32) 



Note, that this equation shows a simple relation to the 
resolvent of the Gaussian orthogonal ensemble {Gq^^ = 



^G^^^ {z}). The eigenvalue spectrum following from 
Eqs. (14) and (32) can then be written as



1 V 2 + 2+\X\ 



V2tt 



2+|A| 



1 1 



V27rY 2 |A| 2+|A| |A|(2 + |A|) ' 

(33) 

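and is defined on the support $[-2, 2]$. After proper
rescaling and taking an expression equivalent to Eq. (16),
namely

$$\rho(r) = -\frac{1}{\pi r} \frac{d}{dr} \int_r^{\infty} \frac{\lambda\, \rho_x(\lambda)}{\sqrt{\lambda^2 - r^2}} \, d\lambda , \tag{34}$$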
we end up with the expression



which can be evaluated to 



TT^r dr 



(35) 



.2V4r (_\ 



V 4 



r f - 1 $1 



1 3 1 A_2^^ 
4' 4' 2' 2 



where = 6\/ tt^t^, r(a;) denotes the Gamma-Function 
and $2 {(^1 ^) 2;) is the hypergeometric function. It can 

be checked that, of course, $\int_0^{\infty} 2\pi r \rho(r)\, dr = 1$.